Method, device, and computer program product for managing storage system

ABSTRACT

A storage system comprises stripes, with the extents comprised in one stripe among the stripes residing on respective storage devices in the storage system. A failed stripe is determined among the stripes, the failed stripe comprising a group of failed extents residing on a group of failed storage devices, respectively, a number of failed storage devices in the group being less than or equal to a parity width of the storage system. A distribution of the group of failed extents in the failed stripe is obtained. A rebuild parameter for rebuilding data in the failed stripe is generated based on the obtained distribution. The generated rebuild parameter is stored for rebuilding the storage system. Accordingly, a rebuild parameter generated for one failed stripe can be reused for other failed stripes with the same distribution. The performance of rebuild operations may be improved, and the time of rebuild operations may be reduced.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to Chinese Patent Application No. CN201911031260.4, on file at the China National Intellectual Property Administration (CNIPA), having a filing date of Oct. 28, 2019, and having “METHOD, DEVICE, AND COMPUTER PROGRAM PRODUCT FOR MANAGING STORAGE SYSTEM” as a title, the contents and teachings of which are herein incorporated by reference in their entirety.

FIELD

Various implementations of the present disclosure relate to storage management, and more specifically, to a method, device and computer program product for rebuilding a storage system when a storage device in the storage system fails.

BACKGROUND

With the development of data storage technology, various data storage devices now provide users with increasingly large data storage capability, and their data access speed has also been greatly increased. With the increase of data storage capability, users also impose higher demands on data reliability and response time of storage systems. So far various data storage systems based on redundant arrays of independent disks (RAID) have been developed to improve data reliability. When one or more disks in a storage system fail, data in the failed disk(s) can be recovered from other normal disk(s).

A mapped Redundant Array of Independent Disks (mapped RAID) has been developed. In this mapped RAID, a disk is a logical concept and may include a plurality of extents. Extents comprised in one logical disk may be distributed across different physical storage devices in a resource pool. For a plurality of extents in one stripe of the mapped RAID, these extents are supposed to be distributed across different physical storage devices, so that when a physical storage device where one extent among the plurality of extents resides fails, a rebuild operation may be performed to recover data from a physical storage device where another extent resides.

Due to differences in usage state and in the time when each storage device comes into service in the resource pool, one or more storage devices might fail, and data in failed storage devices needs to be rebuilt. A rebuild operation involves a complex computing process. In particular, in a storage system with two or more pieces of parity data, when two or more storage devices fail, the rebuild operation consumes many computing resources. At this point, how to perform the rebuild operation in a more effective way has become a difficult technical problem.

SUMMARY

Therefore, it is desirable to develop and implement a technical solution for rebuilding a storage system more effectively. It is desired that the technical solution be compatible with an existing application system to manage a storage system more effectively by reconstructing configurations of the existing storage system.

According to a first aspect of the present disclosure, a method is provided for managing a storage system. The storage system includes a plurality of stripes, a plurality of extents comprised in one stripe among the plurality of stripes residing on a plurality of storage devices in the storage system, respectively. In the method, a failed stripe is determined among the plurality of stripes, the failed stripe including a group of failed extents residing on a group of failed storage devices, respectively, the number of failed storage devices in the group being less than or equal to parity width of the storage system. Distribution of the group of failed extents in the failed stripe is obtained. A rebuild parameter for rebuilding data in the failed stripe is generated based on the obtained distribution. The generated rebuild parameter is stored for rebuilding the storage system.

According to a second aspect of the present disclosure, a device is provided for managing a storage system. The storage system includes a plurality of stripes, a plurality of extents comprised in one stripe among the plurality of stripes residing on a plurality of storage devices in the storage system, respectively. The device includes: at least one processor; and a memory coupled to the at least one processor, the memory having instructions stored thereon, the instructions, when executed by the at least one processor, causing the device to perform acts. The acts include: determining a failed stripe among the plurality of stripes, the failed stripe including a group of failed extents residing on a group of failed storage devices, respectively, the number of failed storage devices in the group being less than or equal to parity width of the storage system; obtaining distribution of the group of failed extents in the failed stripe; generating a rebuild parameter for rebuilding data in the failed stripe based on the obtained distribution; and storing the generated rebuild parameter for rebuilding the storage system.

According to a third aspect of the present disclosure, a computer program product is provided. The computer program product is tangibly stored on a non-transitory computer-readable medium and includes machine-executable instructions which are used to implement a method according to the first aspect of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

Features, advantages and other aspects of the implementations of the present disclosure will become more apparent through the more detailed description below with reference to the accompanying drawings. Several implementations of the present disclosure are illustrated schematically and are not intended to limit the present invention. In the drawings:

FIGS. 1A and 1B each show a block diagram of a storage system in which a method of the present disclosure may be implemented;

FIG. 2 schematically shows a block diagram of an example environment in which a method of the present disclosure may be implemented;

FIG. 3 schematically shows a diagram of a storage resource pool in FIG. 2;

FIG. 4 schematically shows a block diagram of the process for managing a storage system according to example implementations of the present disclosure;

FIG. 5 schematically shows a flowchart of a method for managing a storage system according to example implementations of the present disclosure;

FIG. 6 schematically shows a block diagram of a relationship between a stripe in a storage system and a corresponding rebuild parameter according to example implementations of the present disclosure;

FIG. 7 schematically shows a block diagram of a mapping relation between an index and a rebuild parameter according to example implementations of the present disclosure;

FIG. 8 schematically shows a flowchart of a method for rebuilding a stripe in a storage system according to example implementations of the present disclosure;

FIG. 9 schematically shows a block diagram of a rebuilt storage system according to example implementations of the present disclosure; and

FIG. 10 schematically shows a block diagram of a device for managing a storage system according to example implementations of the present disclosure.

DETAILED DESCRIPTION OF IMPLEMENTATIONS

The individual features of the various embodiments, examples, and implementations disclosed within this document can be combined in any desired manner that makes technological sense. Furthermore, the individual features are hereby combined in this manner to form all possible combinations, permutations and variants except to the extent that such combinations, permutations and/or variants have been explicitly excluded or are impractical. Support for such combinations, permutations and variants is considered to exist within this document.

It should be understood that the specialized circuitry that performs one or more of the various operations disclosed herein may be formed by one or more processors operating in accordance with specialized instructions persistently stored in memory. Such components may be arranged in a variety of ways such as tightly coupled with each other (e.g., where the components electronically communicate over a computer bus), distributed among different locations (e.g., where the components electronically communicate over a computer network), combinations thereof, and so on.

The preferred implementations of the present disclosure will be described in more detail with reference to the drawings. Although the drawings illustrate the preferred implementations of the present disclosure, it should be appreciated that the present disclosure can be implemented in various ways and should not be limited to the implementations explained herein. On the contrary, the implementations are provided to make the present disclosure more thorough and complete and to fully convey the scope of the present disclosure to those skilled in the art.

As used herein, the term “includes” and its variants are to be read as open-ended terms that mean “includes, but is not limited to.” The term “or” is to be read as “and/or” unless the context clearly indicates otherwise. The term “based on” is to be read as “based at least in part on.” The terms “one example implementation” and “one implementation” are to be read as “at least one example implementation.” The term “a further implementation” is to be read as “at least a further implementation.” The terms “first”, “second” and so on can refer to same or different objects. The following text can also include other explicit and implicit definitions.

In the context of the present disclosure, the storage system may be a RAID-based storage system. The RAID-based storage system may combine a plurality of storage devices into an array of disks. By providing redundant storage devices, the reliability of the entire disk group can significantly exceed that of a single storage device. RAID may offer various advantages over a single storage device, for example, enhancing data integrity, enhancing fault tolerance, increasing throughput or capacity, etc. There exist a number of RAID standards, such as RAID-1, RAID-2, RAID-3, RAID-4, RAID-5, RAID-6, RAID-10, RAID-50, etc. For more details about RAID levels, those skilled in the art may refer to https://en.wikipedia.org/wiki/Standard_RAID_levels and https://en.wikipedia.org/wiki/Nested_RAID_levels, etc.

FIG. 1A schematically illustrates a view of a storage system 100A in which a method of the present disclosure may be implemented. In the storage system shown in FIG. 1A, working principles of RAID are illustrated by taking a RAID-5 (4D+1P, where 4D represents that 4 storage devices are included in the storage system for storing data, and 1P represents that 1 storage device is included in the storage system for storing parity) array that consists of five independent storage devices (110, 112, 114, 116 and 118) as an example. It should be noted that although five storage devices are schematically shown in FIG. 1A, in other implementations more or fewer storage devices may be included according to different levels of RAID. Moreover, although FIG. 1A illustrates stripes 120, 122, 124, . . . , 126, in other examples the RAID system may include a different number of stripes.

In RAID, a stripe may cross a plurality of physical storage devices (for example, the stripe 120 crosses the storage devices 110, 112, 114, 116 and 118). The stripe may be simply construed as a storage area among a plurality of storage devices which satisfies a given address range. Data stored in the stripe 120 includes a plurality of parts: a data extent D00 stored in the storage device 110, a data extent D01 stored in the storage device 112, a data extent D02 stored in the storage device 114, a data extent D03 stored in the storage device 116, and an extent P0 stored in the storage device 118. In this example, the data extents D00, D01, D02 and D03 are stored data, and the extent P0 is a P parity of the stored data.

The mode of storing data in the other stripes 122 and 124 is similar to that in the stripe 120; the difference is that the parity for the other data extents may be stored in a storage device other than the storage device 118. In this way, when one of the plurality of storage devices 110, 112, 114, 116 and 118 fails, data in the failed device may be recovered from the other normal storage devices.

FIG. 1B schematically illustrates a view 100B of a rebuilding process of the storage system 100A. As shown in FIG. 1B, when one storage device (e.g., the shaded storage device 116) fails, data may be recovered from the other storage devices 110, 112, 114 and 118 that operate normally. At this point, a new backup storage device 118B may be added to RAID to replace the failed storage device 116. In this way, recovered data may be written to 118B, and system rebuilding may be realized.

Note that while a RAID-5 storage system including 5 storage devices (among which 4 storage devices are used for storing data and 1 storage device is used for storing parity) has been described with reference to FIGS. 1A and 1B, according to the definition of other RAID levels, there may further exist a storage system including a different number of storage devices. On the basis of the definition of RAID-6, for example, two storage devices may be used to store parity P and Q, respectively. In another example, according to the definition of triple-parity RAID, three storage devices may be used to store parity P, Q and R, respectively.

With the development of distributed storage technologies, the various storage devices 110, 112, 114, 116 and 118 in the storage system shown in FIGS. 1A and 1B may no longer be limited to physical storage devices but may be virtual storage devices. For example, respective extents on the storage device 110 may come from different physical storage devices (hereinafter referred to as storage devices for short) in the resource pool. FIG. 2 schematically shows a block diagram of an example environment 200 in which the method of the present disclosure may be implemented. As depicted, a storage resource pool 282 may include a plurality of physical storage devices 210, 220, 230, 240, 250, . . . , 260. Storage space in the plurality of storage devices may be allocated to a plurality of storage systems 290, . . . , 292, and these storage systems 290, . . . , 292 may access the storage space in the various storage devices in the storage resource pool 282 via a network 280.

It will be understood that when a single storage device in the storage system fails, data in the failed storage device may be rebuilt based on an XOR operation, and the rebuild operation has lower complexity. When two storage devices fail in a storage system that can tolerate this without data loss (e.g., a 4D+1P+1Q storage system or a 4D+1P+1Q+1R storage system), the rebuild operation involves higher complexity.
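The single-failure case can be illustrated with a minimal sketch, assuming a 4D+1P stripe whose parity extent is the byte-wise XOR of the data extents. The function name `xor_blocks` and the sample byte values are hypothetical and serve only as an illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Four data extents of a 4D+1P stripe (made-up sample contents).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
parity = xor_blocks(data)

# Suppose the extent at position 2 is lost: XOR the surviving data extents
# with the parity extent to rebuild it.
survivors = [extent for position, extent in enumerate(data) if position != 2]
recovered = xor_blocks(survivors + [parity])
```

Because XOR is its own inverse, the same routine both generates the parity and rebuilds a single lost extent, which is why this case is comparatively cheap.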

Schematic implementations of the present disclosure will be described under an application environment that is a 4D+1P+1Q storage system. For example, in a RAID-6 storage system, the complexity of a rebuild operation is O(n³), wherein n is the data width of the storage system. The data width of a 4D+1P+1Q storage system is 4, so the complexity of a rebuild operation is on the order of 4³ = 64; the data width of an 8D+1P+1Q storage system is 8, so the complexity is on the order of 8³ = 512; and the data width of a 16D+1P+1Q storage system is 16, so the complexity is on the order of 16³ = 4096. At this point, how to perform a rebuild operation in a more effective way has become an urgent problem.

FIG. 3 schematically shows a diagram of more information of the storage resource pool 282 as shown in FIG. 2. The resource pool 282 may include a plurality of storage devices 210, 220, 230, 240, 250, 260, . . . , 270. Each storage device may include a plurality of extents, where a legend 320 represents a free extent, a legend 322 represents an extent for RAID stripe 1 of the storage system, a legend 324 represents an extent for RAID stripe 2 of the storage system, and a legend 326 represents an extent for RAID stripe 3 of the storage system. At this point, extents D01, D11, D21 and D31 for RAID stripe 1 are used for storing data extents of the stripe, respectively, and extents D41 and D51 are used for storing parity P and parity Q, respectively. Extents D12, D22, D32 and D42 for RAID stripe 2 are used for storing data extents of the stripe, respectively, and extents D52 and D62 are used for storing parity P and parity Q, respectively.

As shown in FIG. 3, an address mapping 330 shows associations between a stripe and addresses of extents in the stripe. For example, RAID stripe 1 may include 6 extents, namely D01, D11, D21, D31, D41 and D51, which reside on the storage devices 210, 220, 230, 240, 250 and 260, respectively. Specifically, extent D01 is the first extent in the storage device 210, and extent D11 is the first extent in the storage device 220. There may also exist a reserved spare portion 310 in each storage device, so that when a storage device in the resource pool fails, an extent in the spare portion 310 in each storage device may be selected to rebuild various extents in the failed storage device.
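One way to picture such an address mapping is the following sketch. It assumes that an extent label of the form `Dxy` encodes the position `x` of the storage device in the resource pool and the extent number `y` on that device, which matches the examples of D01 and D11 given for FIG. 3. The dictionary layout and the helper names `locate` and `devices_of` are illustrative only.

```python
# Storage devices of the resource pool, in the order shown in FIG. 3.
DEVICES = [210, 220, 230, 240, 250, 260, 270]

# Hypothetical representation of address mapping 330: stripe number -> extent labels.
address_mapping = {
    1: ["D01", "D11", "D21", "D31", "D41", "D51"],
    2: ["D12", "D22", "D32", "D42", "D52", "D62"],
    3: ["D03", "D13", "D23", "D33", "D43", "D53"],
}

def locate(extent):
    """Return (storage device, extent number on that device) for a label like 'D11'."""
    device_position, extent_number = int(extent[1]), int(extent[2])
    return DEVICES[device_position], extent_number

def devices_of(stripe):
    """Storage devices holding the extents of a stripe, in stripe order."""
    return [locate(extent)[0] for extent in address_mapping[stripe]]
```

With this layout, `devices_of(1)` lists the six devices that hold RAID stripe 1, which is the information needed later to decide which extents of a stripe have failed.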

Note that in FIG. 3 the 4D+1P+1Q RAID-6 storage system is taken as an example to illustrate how extents in various stripes are distributed over a plurality of storage devices in the resource pool. When RAID based on another level is employed, those skilled in the art may implement concrete details on the basis of the above described principles. For example, in the 8D+1P+1Q RAID-6 storage system, the 10 extents in each stripe may be evenly distributed over 10 storage devices so as to ensure a load balance between the plurality of storage devices.

It will be understood that with the use of the storage system, one or more storage devices among the plurality of storage devices might fail, at which point a rebuild operation needs to be started so as to recover data in the failed storage device(s) to normal storage device(s) for the purpose of avoiding data loss. Technical solutions for rebuilding a storage system have been proposed. Specifically, regarding a failed stripe with extents on two failed storage devices, a rebuild parameter for rebuilding the failed stripe may be generated based on locations of the failed devices in the stripe. However, when the storage system includes a large number of to-be-rebuilt stripes, a corresponding rebuild parameter has to be generated for each stripe one by one before a rebuild may be performed.

Although the above technical solution can rebuild failed stripes in the storage system, the technical solution takes a relatively long time. If a third failed storage device arises during a rebuild, then unrecoverable data loss will occur in the 4D+1P+1Q storage system. Therefore, it is desirable to improve the performance of the rebuild operation and reduce the time of the rebuild operation as much as possible.

To solve the above drawbacks, implementations of the present disclosure provide a method, device and computer program product for managing a storage system. Concrete implementations of the present disclosure will be described in detail below. According to one implementation of the present disclosure, a method is provided for managing a storage system. In the method, a concept of a distribution of failed extents is introduced. If failed extents in two failed stripes are distributed in the same way, then the two failed stripes may share the same rebuild parameter. In other words, a rebuild parameter for one failed stripe may be used for the other failed stripe.

With example implementations of the present disclosure, it is unnecessary to generate a rebuild parameter one by one for each failed stripe including a failed extent; instead, an already generated rebuild parameter may be obtained directly. Regarding a given type of storage system, the number of types of the distribution of failed extents is rather limited, so only a limited number of rebuild parameters needs to be generated and stored, and the stored rebuild parameters may be used to rebuild the storage system. With example implementations of the present disclosure, on the one hand, overheads of unnecessary computing resources and time resources for repetitively generating rebuild parameters may be avoided. On the other hand, the process of the rebuild operation may be shortened greatly, the possibility that another storage device fails during a rebuild may be reduced, and further the reliability of the storage system may be improved.
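As a rough illustration of why the number of distributions is limited, the following sketch enumerates every way two failed extents can be placed in a 4D+1P+1Q stripe of six extents. The constant and function names are hypothetical.

```python
from itertools import combinations
from math import comb

STRIPE_WIDTH = 6   # 4 data extents + 2 parity extents in a 4D+1P+1Q stripe
PARITY_WIDTH = 2

def distributions(stripe_width, failed_count):
    """All ways `failed_count` failed extents can be placed in one stripe."""
    return list(combinations(range(stripe_width), failed_count))

double_failures = distributions(STRIPE_WIDTH, PARITY_WIDTH)
# The count equals the binomial coefficient C(6, 2).
assert len(double_failures) == comb(STRIPE_WIDTH, PARITY_WIDTH)
print(len(double_failures))  # prints 15
```

However many stripes the storage system holds, at most 15 distinct rebuild parameters are therefore needed for double failures in this layout.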

With reference to FIG. 4, a brief description is presented of the process of implementations of the present disclosure. FIG. 4 schematically shows a block diagram 400 of the process for managing a storage system according to example implementations of the present disclosure. As depicted, RAID stripe 1 in the storage system may include 6 extents, i.e., D01, D11, D21, D31, D41 and D51, among which the first four extents are used to store data while the last two extents are used to store parity. The six extents reside on storage devices 210, 220, 230, 240, 250 and 260, respectively. Suppose the storage devices 220 and 240 fail. Since the extents D11 and D31 reside on the failed storage devices 220 and 240, respectively, the extents D11 and D31 are failed extents, and RAID stripe 1 is a failed stripe.

At this point a group of failed extents consists of the two failed extents D11 and D31, and a distribution 410 of the failed extents D11 and D31 in the failed stripe may be determined. Suppose the 6 extents in the stripe are numbered in sequence from 0 to 5; then the failed extents D11 and D31 are the extents at positions 1 and 3 in the stripe, respectively. The distribution of the failed extents may be recorded as (1, 3), indicating that the devices on which the extents at positions 1 and 3 in the stripe reside have failed and that data in these two extents needs to be recovered.

As shown in FIG. 4, a rebuild parameter 420 may be generated for RAID stripe 1, and subsequently the rebuild parameter 420 may be stored in storage space 430 for rebuilding the storage system. It will be understood that the stored rebuild parameter 420 may not only be used to rebuild RAID stripe 1 in the storage system but also may be used to rebuild another failed stripe with the same distribution as RAID stripe 1. Returning to FIG. 3, RAID stripe 3 includes extents D03, D13, D23, D33, D43 and D53. Since a distribution of failed extents in RAID stripe 3 is identical to the distribution of failed extents in RAID stripe 1, the two stripes may share the same rebuild parameter. With example implementations of the present disclosure, the rebuild parameter only needs to be generated for one stripe, so the efficiency of the rebuild operation may be improved greatly.

With reference to FIG. 5, a method for managing a storage system is described in more detail. FIG. 5 schematically shows a flowchart of a method 500 for managing a storage system according to example implementations of the present disclosure. Here the storage system may include a plurality of stripes, and a plurality of extents in one stripe among the plurality of stripes reside on a plurality of storage devices in the storage system, respectively. When a storage device among the plurality of storage devices fails, the method 500 according to example implementations of the present disclosure may be started. Specifically, the method 500 may be started when the number of failed storage devices is less than or equal to the parity width of the storage system.

According to example implementations of the present disclosure, the data width of the storage system refers to the number of data extents in one stripe, and the parity width of the storage system refers to the number of parity extents in one stripe. For example, in a 4D+1P+1Q storage system, the data width is 4, and the parity width is 1+1=2; in an 8D+1P+1Q storage system, the data width is 8, and the parity width is 2.
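The two widths can be read directly off a layout string, as the following small sketch shows. The helper name `widths` and the parsing convention (a count followed by a single capital letter, with `D` marking data extents) are assumptions made for illustration.

```python
import re

def widths(layout):
    """Return (data width, parity width) for a layout string such as '4D+1P+1Q'."""
    data_width = parity_width = 0
    for count, kind in re.findall(r"(\d+)([A-Z])", layout):
        if kind == "D":
            data_width += int(count)
        else:
            # P, Q, R, ... all denote parity extents.
            parity_width += int(count)
    return data_width, parity_width

print(widths("4D+1P+1Q"))  # prints (4, 2)
```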

At block 510, a failed stripe among the plurality of stripes is determined, and the failed stripe includes a group of failed extents residing on a group of failed storage devices, respectively. Here the number of failed storage devices in the group is less than or equal to the parity width of the storage system. For example, in a 4D+1P+1Q storage system, a stripe including extents residing on two failed storage devices may be found in the plurality of stripes. It will be understood that the method 500 may be used to rebuild a stripe including 2 failed extents, while for a stripe including only 1 failed extent, an XOR operation may be used to rebuild data in the failed extent based on an existing method. For another example, in a storage system based on triple parity, the method 500 may further be applied when 2 or 3 failed storage devices arise in the storage system.

At block 520, distribution of the group of failed extents in the failed stripe is obtained. Specifically, the distribution may be determined based on locations of the group of failed extents in the failed stripe. Returning to the example of FIG. 3, RAID stripe 2 includes extents D12, D22, D32, D42, D52 and D62. Where the storage devices 220 and 240 fail, extents D12 and D32 are failed extents (the extents at positions 0 and 2 in the stripe, respectively). Thereby, the distribution in RAID stripe 2 is (0, 2).
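The step at block 520 can be sketched as follows, assuming each stripe is represented by the ordered list of storage devices that hold its extents. The function name `failed_distribution` and this representation are hypothetical.

```python
def failed_distribution(stripe_devices, failed_devices):
    """Positions (counted from 0) of the extents that reside on failed devices."""
    return tuple(
        position
        for position, device in enumerate(stripe_devices)
        if device in failed_devices
    )

failed = {220, 240}
stripe_1 = [210, 220, 230, 240, 250, 260]  # devices holding D01, D11, D21, D31, D41, D51
stripe_2 = [220, 230, 240, 250, 260, 270]  # devices holding D12, D22, D32, D42, D52, D62

print(failed_distribution(stripe_1, failed))  # prints (1, 3)
print(failed_distribution(stripe_2, failed))  # prints (0, 2)
```

Returning a tuple makes the distribution hashable, so it can serve directly as a lookup key for stored rebuild parameters.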

At block 530, a rebuild parameter for rebuilding data in the failed stripe is generated based on the obtained distribution. According to example implementations of the present disclosure, the rebuild parameter may be generated in the following way, described here with reference to RAID stripe 1, whose distribution is (1, 3). First of all, an original matrix A may be built, the matrix including n columns of data (n is the data width of the storage system) and n+m rows of data (m is the parity width of the storage system). Therefore, in a 4D+1P+1Q storage system, the original matrix A may be represented as below:

$\text{original matrix } A = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ p1 & p2 & p3 & p4 \\ q1 & q2 & q3 & q4 \end{pmatrix}$

The original matrix A includes two portions: the upper 4×4 matrix is an identity matrix, and the two rows in the lower 2×4 matrix include the parity entries p1-p4 and q1-q4 for the parity extents, respectively. Since the extents at positions 1 and 3 (counting from 0) in RAID stripe 1 are failed extents, the rows at positions 1 and 3 (again counting from 0) may be removed from the original matrix A to form an intermediate matrix B.

$\text{intermediate matrix } B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ p1 & p2 & p3 & p4 \\ q1 & q2 & q3 & q4 \end{pmatrix}$
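The construction of the two matrices can be sketched as follows. The parity entries are kept as symbolic strings because their values are not given here, and the function names `original_matrix` and `intermediate_matrix` are hypothetical.

```python
def original_matrix(data_width, parity_rows):
    """Stack an identity block on top of the parity rows: (n + m) rows, n columns."""
    identity = [
        [1 if row == col else 0 for col in range(data_width)]
        for row in range(data_width)
    ]
    return identity + [list(parity_row) for parity_row in parity_rows]

def intermediate_matrix(matrix, distribution):
    """Drop the rows whose positions (counted from 0) appear in the distribution."""
    return [row for position, row in enumerate(matrix) if position not in distribution]

P = ["p1", "p2", "p3", "p4"]  # symbolic entries of the P parity row
Q = ["q1", "q2", "q3", "q4"]  # symbolic entries of the Q parity row

A = original_matrix(4, [P, Q])
B = intermediate_matrix(A, (1, 3))
for row in B:
    print(row)
```

Removing two rows from the 6×4 matrix A leaves a square 4×4 matrix, which is what allows an inverse to be computed in the next step.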

Subsequently, an inverse matrix C of the intermediate matrix B may be obtained through an inverse operation. The process of the inverse operation is omitted here, and the inverse matrix C may be represented as below. Data in failed extents with distribution (1, 3) may be recovered using the inverse matrix C. Thereby, a rebuild parameter may include the inverse matrix C. It will be understood that the process for generating the inverse matrix C and the process for recovering data in the failed stripe based on the inverse matrix C are the same as in the prior art and thus are not detailed here.

$\text{inverse matrix } C = \begin{pmatrix} a & b & c & d \\ x1 & x2 & x3 & x4 \\ e & f & g & h \\ y1 & y2 & y3 & y4 \end{pmatrix}$
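The following sketch shows one way an inverse matrix can recover the data of a stripe with distribution (1, 3). It is an illustration under loud assumptions: the parity entries P = (1, 1, 1, 1) and Q = (1, 2, 3, 4) are made up, and the arithmetic is done over the rationals with Python's `fractions.Fraction`, whereas RAID-6 implementations commonly use Galois-field arithmetic. The helper names `invert` and `multiply` are hypothetical.

```python
from fractions import Fraction

def invert(matrix):
    """Gauss-Jordan inversion over the rationals; raises ValueError if singular."""
    size = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [Fraction(value) for value in row] + [Fraction(int(i == j)) for j in range(size)]
        for i, row in enumerate(matrix)
    ]
    for col in range(size):
        pivot = next((r for r in range(col, size) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [value / pivot_value for value in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def multiply(matrix, vector):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

# Assumed parity entries, chosen only so that the example is invertible.
P = [1, 1, 1, 1]
Q = [1, 2, 3, 4]
A = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], P, Q]

data = [7, 11, 13, 17]       # made-up contents of the four data extents
stripe = multiply(A, data)   # six extents: the data followed by P and Q parity

distribution = (1, 3)
B = [row for i, row in enumerate(A) if i not in distribution]
surviving = [value for i, value in enumerate(stripe) if i not in distribution]

C = invert(B)                        # the rebuild parameter for distribution (1, 3)
recovered = multiply(C, surviving)   # all four data extents, including the lost ones
assert recovered == data
```

The inverse C depends only on the distribution and the parity entries, not on the stripe contents, which is what makes it reusable across every failed stripe with the same distribution.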

At block 540, the generated rebuild parameter may be stored for rebuilding the storage system. Specifically, the inverse matrix C may be used to rebuild the storage system and thus is stored in storage space. It will be understood that an example of generating the rebuild parameter for the stripe with the distribution (1, 3) has been presented for the purpose of illustration, and when failed extents reside elsewhere, a corresponding rebuild parameter may likewise be generated. For example, regarding RAID stripe 2 as shown in FIG. 3, the distribution of failed extents in the stripe is (0, 2). At this point, rows corresponding to the distribution of failed extents may be removed from the original matrix A, and an intermediate matrix B′ may be generated based on a similar method. Further, an inverse matrix C′ for rebuilding RAID stripe 2 may be generated based on the intermediate matrix B′. At this point, the generated inverse matrix C′ may be used to rebuild data in the stripe with the distribution (0, 2).

$\text{intermediate matrix } B^{\prime} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ p1 & p2 & p3 & p4 \\ q1 & q2 & q3 & q4 \end{pmatrix}$

According to example implementations of the present disclosure, each failed stripe may be rebuilt using the rebuild parameter generated for its distribution. Specifically, a group of free extents may be selected from a group of normal storage devices other than the group of failed storage devices among the plurality of storage devices. To rebuild RAID stripe 1, two free extents may be selected from normal storage devices in the resource pool. It will be understood that the free extents selected here are supposed to reside on storage devices other than the storage devices on which the extents of the failed stripe reside. Subsequently, data in the group of failed extents may be rebuilt to the selected group of free extents based on the stored rebuild parameter. For example, data in RAID stripe 1 may be rebuilt using the inverse matrix C, and data in RAID stripe 2 may be rebuilt using the inverse matrix C′.

According to example implementations of the present disclosure, since a load balance among the plurality of storage devices needs to be guaranteed, free extents may preferably be selected from storage devices with lower workloads. Specifically, workloads of the group of storage devices may be determined first. Here the workload may cover various aspects. For example, the workload of a storage device may be determined based on storage space used in the storage device. Alternatively and/or additionally, the workload may further be determined based on processing resources, bandwidth and other states of the storage device. A predetermined condition may be set, e.g., a storage device with the lowest workload may be selected. Alternatively and/or additionally, the predetermined condition may further include selecting a storage device with a lower workload. A storage device with the lowest/lower workload in the group of storage devices may be determined, and a free extent may be selected from the determined storage device.

With example implementations of the present disclosure, it may be ensured that the rebuild process of the storage system works in a way that workloads of the plurality of storage devices are made as balanced as possible. On the one hand, a storage device with a lower workload among the plurality of storage devices may be used to increase response speed, and on the other hand, it may further be ensured that wear states of the plurality of storage devices are as consistent as possible.

According to example implementations of the present disclosure, the rebuild operation is supposed to conform to RAID standards. In other words, a plurality of extents in the rebuilt RAID stripe are supposed to reside on different storage devices. Therefore, it should be ensured that the free extent and other extents in the stripe reside on different storage devices, respectively. Specifically, if it is determined that a storage device with a lower workload is different from a storage device where any extent in the stripe resides, then a free extent may be selected from the storage device. With example implementations of the present disclosure, it may be ensured that a plurality of extents in the rebuilt stripe reside on different storage devices.
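The selection rules described above (prefer lower workloads, never reuse a storage device that already holds an extent of the stripe) can be sketched as follows. This is an illustration only: the workload is assumed to be a single number per device, and the device names, the dictionaries and the function name `select_free_extents` are hypothetical.

```python
def select_free_extents(workloads, free_extents, stripe_devices, count):
    """Pick `count` free extents, each from a different low-workload device.

    Devices that already hold an extent of the stripe (failed or not) are
    skipped, so every extent of the rebuilt stripe stays on a different device.
    """
    eligible = [
        device
        for device in workloads
        if device not in stripe_devices and free_extents.get(device)
    ]
    eligible.sort(key=lambda device: workloads[device])  # lowest workload first
    if len(eligible) < count:
        raise RuntimeError("not enough eligible storage devices with free extents")
    return [(device, free_extents[device][0]) for device in eligible[:count]]

# Hypothetical pool: "dev0".."dev5" hold the stripe, "dev6".."dev8" do not.
stripe_devices = {"dev0", "dev1", "dev2", "dev3", "dev4", "dev5"}
workloads = {"dev0": 0.2, "dev1": 0.9, "dev6": 0.7, "dev7": 0.3, "dev8": 0.1}
free_extents = {"dev0": [5], "dev6": [4, 9], "dev7": [2], "dev8": []}

chosen = select_free_extents(workloads, free_extents, stripe_devices, count=2)
print(chosen)  # prints [('dev7', 2), ('dev6', 4)]
```

In this made-up pool, `dev8` has the lowest workload but no free extent, and `dev0` has a free extent but already belongs to the stripe, so both are passed over.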

According to example implementations of the present disclosure, since the generated rebuild parameter may be shared between failed stripes with the same distribution, the rebuild parameter may further be used to rebuild data in other failed stripes with the same distribution. Specifically, a further failed stripe among the plurality of stripes may be determined, where the further failed stripe includes a further group of failed extents residing on the group of failed storage devices, and a further distribution of the further group of failed extents in the further failed stripe is identical to the distribution of the group of failed extents in the failed stripe. A further group of free extents may be selected from the plurality of storage devices according to the above method, e.g., based on a load balancing principle. Subsequently, data in the further group of failed extents may be rebuilt to the selected further group of free extents based on the stored rebuild parameter.

FIG. 6 schematically shows a block diagram 600 of a relationship between a stripe in a storage system and a corresponding rebuild parameter according to example implementations of the present disclosure. As depicted, a plurality of stripes included in the storage system may be processed one by one. Suppose the storage system includes 3 stripes 620, 622 and 624, at which point a corresponding rebuild parameter may be generated or a generated rebuild parameter may be selected, according to the type of distribution of failed extents in each stripe.

As shown in FIG. 6, a rebuild parameter list 610 may be used to store rebuild parameters generated for various types of distribution. First of all, the first stripe 620 is processed. Distribution of the stripe 620 is (1, 3), at which point a corresponding rebuild parameter 1 may be generated for the distribution (1, 3). Subsequently, a next stripe 622 is processed, whose distribution is (0, 2). Since the distribution of the stripe 622 is different from that of the stripe 620, a corresponding rebuild parameter 2 needs to be generated for the distribution (0, 2). Then, a next stripe 624 is processed, whose distribution is also (1, 3) like the distribution of the stripe 620. Therefore, the rebuild parameter 1 may be reused, and data in the stripe 624 may be rebuilt based on the rebuild parameter 1.

Suppose the storage system includes 10 failed stripes with distribution (1, 3); then an inverse matrix C may be generated only while rebuilding the first failed stripe, and the inverse matrix C may be reused for the following 9 failed stripes. At this point, resource overheads for generating rebuild parameters are reduced to 10% of those of existing technical solutions (one generation instead of ten). With example implementations of the present disclosure, the efficiency of rebuild operations may be greatly improved.
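The reuse of rebuild parameters across stripes with the same distribution may be illustrated with a small cache keyed by the distribution. In this sketch, `generate_rebuild_parameter` is a hypothetical stand-in for the costly generation step (for instance, computing the inverse matrix C) and simply returns a label; the class name and its counter are likewise assumptions for illustration.

```python
def generate_rebuild_parameter(distribution):
    """Hypothetical placeholder for the expensive generation step.

    A real system would compute something like an inverse matrix here;
    this sketch only returns a descriptive label.
    """
    return f"rebuild-parameter-for-{distribution}"


class RebuildParameterList:
    """Stores one rebuild parameter per distribution and reuses it."""

    def __init__(self, generate=generate_rebuild_parameter):
        self._generate = generate
        self._params = {}
        self.generated = 0                 # how many parameters were computed

    def get(self, distribution):
        key = tuple(distribution)
        if key not in self._params:        # first stripe with this distribution
            self._params[key] = self._generate(key)
            self.generated += 1
        return self._params[key]           # later stripes reuse the stored value


# Stripes 620, 622 and 624 with the distributions described for FIG. 6.
stripes = {620: (1, 3), 622: (0, 2), 624: (1, 3)}
plist = RebuildParameterList()
used = {stripe: plist.get(dist) for stripe, dist in stripes.items()}
```

With these three stripes only two parameters are generated, and stripes 620 and 624 share the same stored parameter.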

According to example implementations of the present disclosure, to further improve the performance of rebuild operations, an index may be generated for the rebuild parameter based on the obtained distribution. For example, the index may be generated using locations of failed extents in the stripe as indicated by the distribution. With example implementations of the present disclosure, a corresponding rebuild parameter may be found among a plurality of generated rebuild parameters more quickly, and the performance of rebuild operations may be further improved.

According to example implementations of the present disclosure, index space for generating the index may be determined based on data width and the parity width of the storage system, and the index may be generated in the index space. Here the index space refers to space occupied by the index. Specifically, the size of the index space depends on the number of candidate types of possible distribution of the group of failed extents in the failed stripe.

According to example implementations of the present disclosure, the number of candidate types of distribution of the group of failed extents in the failed stripe may be determined based on the data width and the parity width.

It will be understood that the number of candidate types refers to the number of possible different distributions. For example, regarding a 4D+1P+1Q storage system, its data width is 4 and its parity width is 2. Where two storage devices fail, the two failed storage devices may be any two of the 0th, 1st, 2nd and 3rd storage devices. Therefore,

$C_{4}^{2} = \frac{4 \times (4 - 1)}{2} = 6$

circumstances might arise. Capacity of the index space may be determined based on the number of the candidate types. Therefore, if failed stripes in the storage system involve all 6 circumstances, then at most 6 rebuild parameters will be generated. At this point, an index may be built for each of the 6 rebuild parameters.
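The count of candidate types can be checked by enumerating them. The following sketch uses the Python standard library; the function names are assumptions chosen for illustration.

```python
from itertools import combinations
from math import comb


def candidate_distributions(data_width, failed_count):
    """List every possible distribution of failed extent locations."""
    return list(combinations(range(data_width), failed_count))


def index_space_capacity(data_width, parity_width):
    """Number of candidate types, i.e. the capacity needed for the index space."""
    return comb(data_width, parity_width)


# Data width 4 and parity width 2, as in the 4D+1P+1Q example.
dists = candidate_distributions(4, 2)
capacity = index_space_capacity(4, 2)
```

For the example above, `dists` contains the six pairs from (0, 1) to (2, 3), and `capacity` equals 6.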

FIG. 7 schematically shows a block diagram 700 of a mapping relationship between an index and a rebuild parameter according to example implementations of the present disclosure. A shaded triangle on the left of FIG. 7 shows the scope of the index space. The index space includes 6 indices, and each index may correspond to one rebuild parameter. A rebuild list 720 on the right of FIG. 7 shows 6 possible circumstances of distribution, wherein the first column shows locations of failed extents involved in the distribution. For example, as shown by the second row from the bottom, (1, 3) means failed extents are at the 1st and 3rd locations in the failed stripe. In another example, as shown by the last row, (2, 3) means failed extents are at the 2nd and 3rd locations in the failed stripe, respectively.

According to example implementations of the present disclosure, the rebuild parameter may be mapped to an element in the index space, at which point a location of the element in the index space is associated with the distribution of the group of failed extents in the failed stripe. With reference to FIG. 7, a one-to-one mapping relationship between each element (a shaded portion) in the index space and each rebuild parameter in the rebuild list may be built. Take elements 712 and 714 for example. A mapping relationship may be built between the element 712 (at the location (1, 3) in the index space) and a rebuild parameter 722 in the rebuild list, and a mapping relationship may be built between the element 714 (at the location (2, 3) in the index space) and a rebuild parameter 724 in the rebuild list. With example implementations of the present disclosure, generating the index makes it easy to manage rebuild parameters and further improves the performance of rebuild operations.

It will be understood that FIG. 7 schematically shows an example of indices according to example implementations of the present disclosure. In other implementations, an index may further be built in another way. For example, suppose two failed extents reside at locations i and j in the failed stripe; then an index may be determined based on Formula 1 below.

$\mathrm{index}(i, j) = \frac{j \times (j - 1)}{2} + i \qquad \text{(Formula 1)}$

In Formula 1, both i and j are integers and i<j. At this point, regarding the distribution (1, 3) in FIG. 7, the corresponding index (1, 3)=3*(3−1)/2+1=4. At this point, the rebuild parameter 722 will be at location 4 in the index. In another example, regarding the distribution (2, 3) in FIG. 7, the corresponding index (2, 3)=3*(3−1)/2+2=5. At this point, the rebuild parameter 724 will be at location 5 in the index. In this way, indices may be further simplified, and the rebuild process may be effected with less computing resources and time overheads.
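The index computation can be sketched directly from the worked values above, where the distribution (1, 3) maps to 4 and (2, 3) maps to 5. The function name `index_of` and the input check are assumptions for illustration.

```python
def index_of(i, j):
    """Map a distribution (i, j) with 0 <= i < j to a single integer index.

    Uses j * (j - 1) / 2 + i, which is consistent with the worked values
    index(1, 3) = 4 and index(2, 3) = 5 given in the text.
    """
    if not 0 <= i < j:
        raise ValueError("expected 0 <= i < j")
    return j * (j - 1) // 2 + i


# All six distributions of a stripe with data width 4.
pairs = [(i, j) for j in range(4) for i in range(j)]
indices = [index_of(i, j) for i, j in pairs]
```

For data width 4 the six distributions receive the distinct indices 0 to 5, so a flat array of six slots can hold the rebuild parameters.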

In a more general circumstance, suppose the data width of the storage system is n and the parity width is m; then at most $C_{n}^{m}$ types of distribution will arise, at which point integers 0 to $C_{n}^{m} - 1$ may be used as an index of each rebuild parameter, respectively.
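The disclosure does not state how the integers are assigned in the general case. One possible assignment, offered here purely as an assumption, is the combinatorial number system, which ranks a sorted set of m locations as a sum of binomial coefficients and reduces to j*(j−1)/2+i when m is 2. The function name `distribution_index` is hypothetical.

```python
from itertools import combinations
from math import comb


def distribution_index(distribution):
    """Rank a set of failed locations with the combinatorial number system.

    For sorted locations c_1 < c_2 < ... < c_m the rank is
    C(c_1, 1) + C(c_2, 2) + ... + C(c_m, m).
    This is an assumed scheme, not one specified by the disclosure.
    """
    positions = sorted(distribution)
    if len(set(positions)) != len(positions) or (positions and positions[0] < 0):
        raise ValueError("locations must be distinct non-negative integers")
    return sum(comb(c, k) for k, c in enumerate(positions, start=1))


# With n = 6 and m = 3 every distribution gets a distinct index in 0..C(6,3)-1.
ranks = {distribution_index(d) for d in combinations(range(6), 3)}
```

Because the ranks of all m-element subsets of n locations are exactly the integers 0 to C(n, m)−1, the index space needs no unused slots.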

Description has been presented regarding how to generate, store and index rebuild parameters. Where an index has been generated for a rebuild parameter, a corresponding rebuild parameter may be found quickly in the rebuild parameter list 720 by means of the index. According to example implementations of the present disclosure, a rebuild parameter may be searched for in the generated index based on the distribution of the group of failed extents in the failed stripe. Specifically, while rebuilding a certain failed stripe, the distribution (i, j) of the failed stripe may be determined first, and then a corresponding rebuild parameter may be found quickly in the rebuild parameter list based on the distribution (i, j). Continuing the example of FIG. 7, the index element 712 may be found in the index 710 based on the distribution (1, 3), and the failed stripe may be recovered using the rebuild parameter that is in a mapping relationship with the index element 712.

According to example implementations of the present disclosure, the above method 500 may be performed when a failed storage device appears in the storage system. In the initial stage of the method 500, the rebuild parameter list is empty, and as failed stripes with different distribution are discovered continuously, different rebuild parameters are generated gradually. Generated rebuild parameters may be added to the rebuild parameter list one by one, and indices may be built.

FIG. 8 schematically shows a flowchart of a method 800 for rebuilding a stripe in a storage system according to example implementations of the present disclosure. As depicted, at block 810, a plurality of stripes in the storage system may be traversed to determine a group of failed stripes with predetermined distribution. For example, the process shown at block 810 may be performed each time a new rebuild parameter is generated for a failed stripe with the predetermined distribution. At block 820, a rebuild parameter may be searched for in the index based on the predetermined distribution. Specifically, the search may be carried out in the rebuild parameter list based on the above method. Then at block 830, each failed stripe in the group of failed stripes determined at block 810 may be rebuilt based on the found rebuild parameter.
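The three blocks of the method 800 may be sketched as below. The data layout (a stripe as a tuple of device identifiers, one per extent location), the `index` mapping from a distribution to its rebuild parameter, and the `rebuild_stripe` callback are all assumptions made for illustration.

```python
def failed_positions(stripe_devices, failed_devices):
    """Distribution of a stripe: locations whose device has failed."""
    return tuple(
        pos for pos, dev in enumerate(stripe_devices) if dev in failed_devices
    )


def rebuild_stripes_with_distribution(
    stripes, failed_devices, distribution, index, rebuild_stripe
):
    """Rebuild every failed stripe that has the given distribution."""
    # Block 810: traverse the stripes and collect those with the distribution.
    group = [
        stripe_id
        for stripe_id, devices in stripes.items()
        if failed_positions(devices, failed_devices) == distribution
    ]
    # Block 820: look up the stored rebuild parameter for the distribution.
    parameter = index[distribution]
    # Block 830: rebuild each stripe in the group with the found parameter.
    for stripe_id in group:
        rebuild_stripe(stripe_id, parameter)
    return group


# Hypothetical layout: devices 7 and 9 have failed.
stripes = {
    620: (0, 7, 2, 9, 4, 5),
    622: (7, 1, 9, 3, 4, 5),
    624: (4, 9, 5, 7, 0, 1),
}
calls = []
group = rebuild_stripes_with_distribution(
    stripes, {7, 9}, (1, 3), {(1, 3): "P1", (0, 2): "P2"},
    lambda stripe_id, parameter: calls.append((stripe_id, parameter)),
)
```

In this layout stripes 620 and 624 both have the distribution (1, 3), so they are rebuilt with the same parameter, while stripe 622 is left for a later pass with the distribution (0, 2).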

It will be understood that after the rebuild operation is performed, extents comprised in the stripe will change, so the address mapping needs to be updated based on extents currently comprised in the RAID stripe. FIG. 9 schematically shows a block diagram 900 of an address mapping of a rebuilt storage system according to example implementations of the present disclosure. Suppose extents D11 and D31 in RAID stripe 1 have been replaced with extents D11′ and D31′, at which point RAID stripe 1 in the updated address mapping will include the extents denoted by reference numeral 910. Suppose extents D12 and D32 in RAID stripe 2 have been replaced with extents D12′ and D32′, at which point RAID stripe 2 in the updated address mapping will include the extents denoted by reference numeral 920. Suppose extents D13 and D33 in RAID stripe 3 have been replaced with extents D13′ and D33′, at which point RAID stripe 3 in the updated address mapping will include the extents denoted by reference numeral 930.
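The update of the address mapping may be pictured as replacing, within each stripe entry, the failed extents with the extents they were rebuilt to. In the sketch below the mapping is modeled as a dictionary from a stripe number to a list of extent names; this representation and the names of the extents other than D11, D31, D11' and D31' are assumptions for illustration.

```python
def update_address_mapping(mapping, stripe_id, replacements):
    """Return a new mapping in which the stripe uses the rebuilt extents.

    `replacements` maps each failed extent to the free extent that now
    holds its rebuilt data; all other extents are kept unchanged.
    """
    updated = dict(mapping)
    updated[stripe_id] = [replacements.get(e, e) for e in mapping[stripe_id]]
    return updated


# Hypothetical entry for RAID stripe 1 before the rebuild.
mapping = {1: ["D01", "D11", "D21", "D31", "D41", "D51"]}
mapping = update_address_mapping(mapping, 1, {"D11": "D11'", "D31": "D31'"})
```

After the call, the entry for RAID stripe 1 lists D11' and D31' in place of D11 and D31, and the remaining extents are unchanged.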

With example implementations of the present disclosure, the updated address mapping may reflect the latest mapping relationships between various RAID stripes and extents in the storage system. Subsequent data read requests and data write requests may be served based on the updated address mapping.

It will be understood that although implementations for managing a storage system have been described by taking a 4D+1P+1Q storage system as an example, in other implementations the storage system may further be a RAID-6 storage system with another data width. For example, the storage system may be an 8D+1P+1Q or 16D+1P+1Q storage system. In another example, the storage system may further be a storage system with another parity width.

With example implementations of the present disclosure, when the number of failed storage devices is no more than the parity width of the storage system, a rebuild parameter may be shared between failed stripes with the same distribution. In this way, computing resources and time overheads for generating rebuild parameters may be reduced, and further the rebuild efficiency may be improved.

While examples of the method according to the present disclosure have been described in detail with reference to FIGS. 2 to 9, description is presented below for the implementation of a corresponding apparatus. According to example implementations of the present disclosure, an apparatus is provided for managing a storage system. The storage system includes a plurality of stripes, a plurality of extents comprised in one stripe among the plurality of stripes residing on a plurality of storage devices in the storage system, respectively. The apparatus includes: a determining module configured to determine a failed stripe among the plurality of stripes, the failed stripe including a group of failed extents residing on a group of failed storage devices, respectively, a number of failed storage devices in the group being less than or equal to parity width of the storage system; an obtaining module configured to obtain a distribution of the group of failed extents in the failed stripe; a generating module configured to generate a rebuild parameter for rebuilding data in the failed stripe based on the obtained distribution; and a storage module configured to store the generated rebuild parameter for rebuilding the storage system.

According to example implementations of the present disclosure, the storage module includes: an index module configured to generate an index for the rebuild parameter based on the obtained distribution.

According to example implementations of the present disclosure, the index module includes: a space determining module configured to determine index space for generating the index based on data width and the parity width of the storage system; and an index generating module configured to generate the index in the index space.

According to example implementations of the present disclosure, the space determining module includes: a number determining module configured to determine a number of candidate types of distribution of the group of failed extents in the failed stripe based on the data width and the parity width; and a capacity determining module configured to determine capacity of the index space based on the number of the candidate types.

According to example implementations of the present disclosure, the index generating module includes: a mapping module configured to map the rebuild parameter to an element in the index space, a location of the element in the index space being associated with the distribution of the group of failed extents in the failed stripe.

According to example implementations of the present disclosure, the apparatus further includes: a selecting module configured to select a group of free extents from a group of normal storage devices other than the group of failed storage devices among the plurality of storage devices; and a rebuilding module configured to rebuild data in the group of failed extents to the selected group of free extents based on the stored rebuild parameter.

According to example implementations of the present disclosure, the selecting module includes: a load module configured to determine workloads of the group of normal storage devices; and an extent selecting module configured to select the group of free extents from the group of normal storage devices based on the determined workloads.

According to example implementations of the present disclosure, the determining module is further configured to determine a further failed stripe among the plurality of stripes, the further failed stripe including a further group of failed extents residing in the group of failed storage devices, further distribution of the further group of failed extents in the further failed stripe being identical to the distribution of the group of failed extents in the failed stripe; the selecting module is further configured to select a further group of free extents from the plurality of storage devices; and the rebuilding module is further configured to rebuild data in the further group of failed extents to the selected further group of free extents based on the stored rebuild parameter.

According to example implementations of the present disclosure, the apparatus further includes a search module configured to obtain the rebuild parameter in the generated index based on the distribution of the group of failed extents in the failed stripe.

According to example implementations of the present disclosure, the storage system is a storage system based on a Redundant Array of Independent Disks, and the parity width of the storage system includes 2.

FIG. 10 schematically shows a block diagram of a device 1000 for managing a storage system according to example implementations of the present disclosure. As depicted, the device 1000 includes a central processing unit (CPU) 1001, which can execute various suitable actions and processing based on the computer program instructions stored in the read-only memory (ROM) 1002 or computer program instructions loaded in the random-access memory (RAM) 1003 from a storage unit 1008. The RAM 1003 can also store all kinds of programs and data required by the operations of the device 1000. The CPU 1001, ROM 1002 and RAM 1003 are connected to each other via a bus 1004. The input/output (I/O) interface 1005 is also connected to the bus 1004.

A plurality of components in the device 1000 are connected to the I/O interface 1005, including: an input unit 1006, such as a keyboard, mouse and the like; an output unit 1007, e.g., various kinds of displays and loudspeakers; a storage unit 1008, such as a magnetic disk or optical disk; and a communication unit 1009, such as a network card, modem, wireless transceiver and the like. The communication unit 1009 allows the device 1000 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunication networks.

The processes described above, such as the methods 500 and 800, can also be executed by the CPU 1001. For example, in some implementations, the methods 500 and 800 can be implemented as a computer software program tangibly included in a machine-readable medium, e.g., the storage unit 1008. In some implementations, the computer program can be partially or fully loaded and/or mounted to the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded to the RAM 1003 and executed by the CPU 1001, one or more steps of the above described methods 500 and 800 can be implemented. Alternatively, in other implementations, the CPU 1001 can also be configured in other suitable manners to realize the above procedure/method.

According to example implementations of the present disclosure, a device is provided for managing a storage system. The storage system includes a plurality of stripes, a plurality of extents comprised in one stripe among the plurality of stripes residing on a plurality of storage devices in the storage system, respectively. The device includes: at least one processor; and a memory coupled to the at least one processor, the memory having instructions stored thereon, the instructions, when executed by the at least one processor, causing the device to perform acts. The acts include: determining a failed stripe among the plurality of stripes, the failed stripe including a group of failed extents residing on a group of failed storage devices, respectively, a number of failed storage devices in the group being less than or equal to parity width of the storage system; obtaining distribution of the group of failed extents in the failed stripe; generating a rebuild parameter for rebuilding data in the failed stripe based on the obtained distribution; and storing the generated rebuild parameter for rebuilding the storage system.

According to example implementations of the present disclosure, storing the generated rebuild parameter for rebuilding the storage system includes: generating an index for the rebuild parameter based on the obtained distribution.

According to example implementations of the present disclosure, generating the index for the rebuild parameter based on the obtained distribution includes: determining index space for generating the index based on data width and the parity width of the storage system; and generating the index in the index space.

According to example implementations of the present disclosure, determining index space for generating the index includes: determining a number of candidate types of the distribution of the group of failed extents in the failed stripe based on the data width and the parity width; and determining capacity of the index space based on the number of the candidate types.

According to example implementations of the present disclosure, generating the index in the index space includes: mapping the rebuild parameter to an element in the index space, a location of the element in the index space being associated with the distribution of the group of failed extents in the failed stripe.

According to example implementations of the present disclosure, the acts further include: selecting a group of free extents from a group of normal storage devices other than the group of failed storage devices among the plurality of storage devices; and rebuilding data in the group of failed extents to the selected group of free extents based on the stored rebuild parameter.

According to example implementations of the present disclosure, selecting the group of free extents from the group of normal storage devices other than the group of failed storage devices among the plurality of storage devices includes: determining workloads of the group of normal storage devices; and selecting the group of free extents from the group of normal storage devices based on the determined workloads.

According to example implementations of the present disclosure, the acts further include: determining a further failed stripe among the plurality of stripes, the further failed stripe including a further group of failed extents residing in the group of failed storage devices, respectively, further distribution of the further group of failed extents in the further failed stripe being identical to the distribution of the group of failed extents in the failed stripe; selecting a further group of free extents from the plurality of storage devices; and rebuilding data in the further group of failed extents to the selected further group of free extents based on the stored rebuild parameter.

According to example implementations of the present disclosure, the acts further include: obtaining the rebuild parameter in the generated index based on the distribution of the group of failed extents in the failed stripe.

According to example implementations of the present disclosure, the storage system is a storage system based on a Redundant Array of Independent Disks, and the parity width of the storage system includes 2.

According to example implementations of the present disclosure, there is provided a computer program product. The computer program product is tangibly stored on a non-transitory computer-readable medium and includes machine-executable instructions which are used to implement the method according to the present disclosure.

According to example implementations of the present disclosure, there is provided a computer-readable medium. The computer-readable medium has machine-executable instructions stored thereon, the machine-executable instructions, when executed by at least one processor, causing the at least one processor to implement the method according to the present disclosure.

The present disclosure can be a method, device, system and/or computer program product. The computer program product can include a computer-readable storage medium, on which the computer-readable program instructions for executing various aspects of the present disclosure are loaded.

The computer-readable storage medium can be a tangible apparatus that maintains and stores instructions utilized by the instruction executing apparatuses. The computer-readable storage medium can be, but is not limited to, an electrical storage device, magnetic storage device, optical storage device, electromagnetic storage device, semiconductor storage device or any appropriate combination of the above. More concrete examples of the computer-readable storage medium (non-exhaustive list) include: portable computer disk, hard disk, random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash), static random-access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disk (DVD), memory stick, floppy disk, mechanical coding devices such as punched cards or projections in a slot with instructions stored thereon, and any appropriate combination of the above. The computer-readable storage medium utilized here is not to be interpreted as transient signals per se, such as radio waves or freely propagated electromagnetic waves, electromagnetic waves propagated via waveguide or other transmission media (such as optical pulses via fiber-optic cables), or electric signals propagated via electric wires.

The described computer-readable program instructions can be downloaded from the computer-readable storage medium to each computing/processing device, or to an external computer or external storage via the Internet, a local area network, a wide area network and/or a wireless network. The network can include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in the computer-readable storage medium of each computing/processing device.

The computer program instructions for executing operations of the present disclosure can be assembly instructions, instructions of instruction set architecture (ISA), machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, wherein the programming languages include object-oriented programming languages, e.g., Smalltalk, C++ and so on, and traditional procedural programming languages, such as the “C” language or similar programming languages. The computer-readable program instructions can be implemented fully on the user computer, partially on the user computer, as an independent software package, partially on the user computer and partially on a remote computer, or completely on the remote computer or server. In the case where a remote computer is involved, the remote computer can be connected to the user computer via any type of network, including a local area network (LAN) and a wide area network (WAN), or to the external computer (e.g., connected via the Internet using an Internet service provider). In some implementations, state information of the computer-readable program instructions is used to customize an electronic circuit, e.g., a programmable logic circuit, field programmable gate array (FPGA) or programmable logic array (PLA). The electronic circuit can execute computer-readable program instructions to implement various aspects of the present disclosure.

Various aspects of the present disclosure are described here with reference to flow charts and/or block diagrams of methods, apparatuses (systems) and computer program products according to implementations of the present disclosure. It should be understood that each block of the flow charts and/or block diagrams and the combination of various blocks in the flow charts and/or block diagrams can be implemented by computer-readable program instructions.

The computer-readable program instructions can be provided to the processing unit of a general-purpose computer, dedicated computer or other programmable data processing apparatuses to manufacture a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatuses, generate an apparatus for implementing functions/actions stipulated in one or more blocks in the flow chart and/or block diagram. The computer-readable program instructions can also be stored in the computer-readable storage medium and cause the computer, programmable data processing apparatus and/or other devices to work in a particular manner, such that the computer-readable medium stored with instructions comprises an article of manufacture, including instructions for implementing various aspects of the functions/actions stipulated in one or more blocks of the flow chart and/or block diagram.

The computer-readable program instructions can also be loaded into a computer, other programmable data processing apparatuses or other devices, so as to execute a series of operation steps on the computer, the other programmable data processing apparatuses or other devices to generate a computer-implemented procedure. Therefore, the instructions executed on the computer, other programmable data processing apparatuses or other devices implement functions/actions stipulated in one or more blocks of the flow chart and/or block diagram.

The flow charts and block diagrams in the drawings illustrate system architecture, functions and operations that may be implemented by systems, methods and computer program products according to a plurality of implementations of the present disclosure. In this regard, each block in the flow chart or block diagram can represent a module, a program segment, or a portion of code, wherein the module, program segment, or portion of code includes one or more executable instructions for performing stipulated logic functions. It should be noted that, in some alternative implementations, the functions indicated in the blocks can also take place in an order different from the one indicated in the drawings. For example, two successive blocks can in fact be executed in parallel or sometimes in a reverse order depending on the functions involved. It should also be noted that each block in the block diagram and/or flow chart and combinations of the blocks in the block diagram and/or flow chart can be implemented by a hardware-based system exclusively for executing stipulated functions or actions, or by a combination of dedicated hardware and computer instructions.

Various implementations of the present disclosure have been described above. The above description is only by way of example rather than exhaustive, and is not limited to the disclosed implementations. Many modifications and alterations, without deviating from the scope and spirit of the explained various implementations, are obvious for those skilled in the art. The selection of terms in the text aims to best explain principles and actual applications of each implementation and technical improvements made in the market by each implementation, or enable others of ordinary skill in the art to understand implementations of the present disclosure.

I/We claim:
 1. A method for managing a storage system, the storage system comprising a plurality of stripes, a plurality of extents comprised in one stripe among the plurality of stripes residing on a plurality of storage devices in the storage system, respectively, the method comprising: determining a failed stripe among the plurality of stripes, the failed stripe comprising a group of failed extents residing on a group of failed storage devices, respectively, a number of failed storage devices in the group being less than or equal to parity width of the storage system; obtaining a distribution of the group of failed extents in the failed stripe; generating a rebuild parameter for rebuilding data in the failed stripe based on the obtained distribution; and storing the generated rebuild parameter for rebuilding the storage system.
 2. The method of claim 1, wherein storing the generated rebuild parameter for rebuilding the storage system comprises: generating an index for the rebuild parameter based on the obtained distribution.
 3. The method of claim 2, wherein generating the index for the rebuild parameter based on the obtained distribution comprises: determining index space for generating the index based on data width and the parity width of the storage system; and generating the index in the index space.
 4. The method of claim 3, wherein determining the index space for generating the index comprises: determining a number of candidate types of the distribution of the group of failed extents in the failed stripe based on the data width and the parity width; and determining capacity of the index space based on the number of the candidate types.
 5. The method of claim 3, wherein generating the index in the index space comprises: mapping the rebuild parameter to an element in the index space, a location of the element in the index space being associated with the distribution of the group of failed extents in the failed stripe.
 6. The method of claim 2, further comprising: selecting a group of free extents from a group of normal storage devices other than the group of failed storage devices among the plurality of storage devices; and rebuilding data in the group of failed extents to the selected group of free extents based on the stored rebuild parameter.
 7. The method of claim 6, wherein selecting the group of free extents from the group of normal storage devices other than the group of failed storage devices among the plurality of storage devices comprises: determining workloads of the group of normal storage devices; and selecting the group of free extents from the group of normal storage devices based on the determined workloads.
 8. The method of claim 2, further comprising: determining a further failed stripe among the plurality of stripes, the further failed stripe comprising a further group of failed extents residing in the group of failed storage devices, respectively, further distribution of the further group of failed extents in the further failed stripe being identical to the distribution of the group of failed extents in the failed stripe; selecting a further group of free extents from the plurality of storage devices; and rebuilding data in the further group of failed extents to the selected further group of free extents based on the stored rebuild parameter.
 9. The method of claim 8, further comprising: obtaining the rebuild parameter in the generated index based on the distribution of the group of failed extents in the failed stripe.
 10. The method of claim 1, wherein the storage system is a storage system based on a Redundant Array of Independent Disks, and the parity width of the storage system comprises 2.
 11. A device for managing a storage system, the storage system comprising a plurality of stripes, a plurality of extents comprised in one stripe among the plurality of stripes residing on a plurality of storage devices in the storage system, respectively, the device comprising: at least one processor; and a memory coupled to the at least one processor and having instructions stored thereon, the instructions, when executed by the at least one processor, causing the device to perform acts, including: determining a failed stripe among the plurality of stripes, the failed stripe comprising a group of failed extents residing on a group of failed storage devices, respectively, a number of failed storage devices in the group being less than or equal to parity width of the storage system; obtaining a distribution of the group of failed extents in the failed stripe; generating a rebuild parameter for rebuilding data in the failed stripe based on the obtained distribution; and storing the generated rebuild parameter for rebuilding the storage system.
 12. The device of claim 11, wherein storing the generated rebuild parameter for rebuilding the storage system comprises: generating an index for the rebuild parameter based on the obtained distribution.
 13. The device of claim 12, wherein generating the index for the rebuild parameter based on the obtained distribution comprises: determining index space for generating the index based on data width and the parity width of the storage system; and generating the index in the index space.
 14. The device of claim 13, wherein determining index space for generating the index comprises: determining a number of candidate types of the distribution of the group of failed extents in the failed stripe based on the data width and the parity width; and determining capacity of the index space based on the number of the candidate types.
 15. The device of claim 13, wherein generating the index in the index space comprises: mapping the rebuild parameter to an element in the index space, a location of the element in the index space being associated with the distribution of the group of failed extents in the failed stripe.
 16. The device of claim 12, wherein the acts further comprise: selecting a group of free extents from a group of normal storage devices other than the group of failed storage devices among the plurality of storage devices; and rebuilding data in the group of failed extents to the selected group of free extents based on the stored rebuild parameter.
 17. The device of claim 16, wherein selecting a group of free extents from the group of normal storage devices other than the group of failed storage devices among the plurality of storage devices comprises: determining workloads of the group of normal storage devices; and selecting the group of free extents from the group of normal storage devices based on the determined workloads.
 18. The device of claim 12, wherein the acts further comprise: determining a further failed stripe among the plurality of stripes, the further failed stripe comprising a further group of failed extents residing in the group of failed storage devices, respectively, further distribution of the further group of failed extents in the further failed stripe being identical to the distribution of the group of failed extents in the failed stripe; selecting a further group of free extents from the plurality of storage devices; and rebuilding data in the further group of failed extents to the selected further group of free extents based on the stored rebuild parameter.
 19. The device of claim 18, wherein the acts further comprise: obtaining the rebuild parameter in the generated index based on the distribution of the group of failed extents in the failed stripe.
 20. A computer program product having a non-transitory computer readable medium which stores a set of instructions to manage a storage system comprising a plurality of stripes, a plurality of extents comprised in one stripe among the plurality of stripes residing on a plurality of storage devices in the storage system, respectively; the set of instructions, when carried out by computerized circuitry, causing the computerized circuitry to perform a method of: determining a failed stripe among the plurality of stripes, the failed stripe comprising a group of failed extents residing on a group of failed storage devices, respectively, a number of failed storage devices in the group being less than or equal to parity width of the storage system; obtaining a distribution of the group of failed extents in the failed stripe; generating a rebuild parameter for rebuilding data in the failed stripe based on the obtained distribution; and storing the generated rebuild parameter for rebuilding the storage system.
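
The indexing of rebuild parameters by failure distribution, and the reuse of a stored parameter for a further failed stripe with an identical distribution, can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation. It assumes that a distribution is the set of extent positions within a stripe that reside on failed devices, that the number of candidate distribution types is the number of ways to choose between 1 and the parity width of those positions, and that a bitmask over positions serves as the location in the index space. The names `candidate_distribution_count`, `distribution_key`, and `RebuildParameterCache` are hypothetical, and the `generate` callback stands in for whatever computation actually produces a rebuild parameter.

```python
from math import comb
from typing import Callable, Dict, Iterable, Tuple


def candidate_distribution_count(data_width: int, parity_width: int) -> int:
    """Count the ways 1..parity_width extents of one stripe can be failed.

    Assumption: each distinct set of failed positions is one candidate type.
    """
    stripe_width = data_width + parity_width
    return sum(comb(stripe_width, k) for k in range(1, parity_width + 1))


def distribution_key(failed_positions: Iterable[int], stripe_width: int) -> int:
    """Encode the failed extent positions of a stripe as a bitmask."""
    key = 0
    for pos in failed_positions:
        if not 0 <= pos < stripe_width:
            raise ValueError(f"position {pos} is outside a stripe of width {stripe_width}")
        key |= 1 << pos
    return key


class RebuildParameterCache:
    """Stores one rebuild parameter per failure distribution (hypothetical helper)."""

    def __init__(
        self,
        data_width: int,
        parity_width: int,
        generate: Callable[[Tuple[int, ...]], object],
    ) -> None:
        self.stripe_width = data_width + parity_width
        self.parity_width = parity_width
        # Capacity of the index space follows the number of candidate types.
        self.capacity = candidate_distribution_count(data_width, parity_width)
        self._generate = generate
        self._index: Dict[int, object] = {}

    def get(self, failed_positions: Iterable[int]) -> object:
        """Return the stored parameter for this distribution, generating it once."""
        positions = tuple(sorted(set(failed_positions)))
        if not 1 <= len(positions) <= self.parity_width:
            raise ValueError("failed extent count must be between 1 and the parity width")
        key = distribution_key(positions, self.stripe_width)
        if key not in self._index:
            # First stripe with this distribution: generate and store.
            self._index[key] = self._generate(positions)
        # Later stripes with the same distribution reuse the stored value.
        return self._index[key]
```

In this sketch a second stripe whose failed extents occupy the same positions hits the stored entry instead of invoking `generate` again, which is the reuse the claims describe. A dictionary keyed by bitmask is used for brevity; a fixed array sized to `capacity` with a combinatorial ranking of the positions would be an equally valid choice.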